Search CORE

15 research outputs found

Augmenting automatic speech recognition and search models for spoken content retrieval

Author: Moriya Yasufumi
Publication venue: Dublin City University. ADAPT
Publication date: 01/11/2022

Spoken content retrieval (SCR) is the process of providing a user with spoken documents in which the user is potentially interested. Unlike textual documents, speech is not trivial to search because of its representation. Generally, automatic speech recognition (ASR) is used to transcribe spoken content such as user-generated videos and podcast episodes into transcripts before search operations are performed. Despite recent improvements in ASR, transcription errors can still be present in automatic transcripts, particularly when ASR is applied to out-of-domain data or speech with background noise. This thesis explores the improvement of ASR systems and search models for enhanced SCR on user-generated spoken content. Three topics are explored. Firstly, the use of multimodal signals for ASR is investigated, with the aim of integrating the background context of spoken content into ASR. Integrating visual signals and document metadata into ASR is hypothesised to produce transcripts better aligned with the background context of the speech. Secondly, semi-supervised training and content genre information from metadata are exploited for ASR, with the aim of mitigating the transcription errors caused by recognition of out-of-domain speech. Thirdly, the use of neural models and their extension using N-best ASR transcripts are investigated. Using N-best rather than 1-best ASR transcripts for search models is motivated by the observation that "key terms" missed in the 1-best transcript can be present in the N-best transcripts. A series of experiments is conducted to examine these approaches. The findings suggest that semi-supervised training brings practical improvement to ASR systems for SCR, and that neural ranking models, in particular with N-best transcripts, improve the results of known-item search over the baseline BM25 model.

DCU Online Research Access Service

Eyes and ears together: new task for multimodal spoken content analysis

Author: Jones Gareth J.F.
Metze Florian
Moriya Yasufumi
Sanabria Ramon
Publication venue: CEUR-WS
Publication date: 01/10/2018

Human speech processing is often a multimodal process combining audio and visual processing. Eyes and Ears Together proposes two benchmark multimodal speech processing tasks: (1) multimodal automatic speech recognition (ASR) and (2) multimodal co-reference resolution on spoken multimedia. These tasks are motivated by our desire to address the difficulties of ASR for multimedia spoken content. We review prior work on the integration of multimodal signals into speech processing for multimedia data, introduce a multimedia dataset for our proposed tasks, and outline these tasks.

Irish Universities

DCU Online Research Access Service

Stacked Denoising Autoencoder for the Front-end of DNN-based Speech Synthesis

Author: Moriya Yasufumi
Publication venue: The University of Edinburgh
Publication date: 01/01/2015

Edinburgh Research Archive

Similarity-Based Heterogeneous Neurons in the Context of General Observational Models

Author: Jones Gareth J.F.
Moriya Yasufumi
Publication venue
Publication date: 01/01/2002

This paper presents a framework for processing heterogeneous information based on the construction of general observational domains, and similarity-based function calculi suitable for data mining in domains which can be described by the corresponding observational models. These calculi are intuitive, simple, and sufficiently general for classification and pattern recognition tasks. Functions in these calculi are represented by a particular kind of neuron model, and their behavior is illustrated with examples from real-world domains showing their capabilities in processing heterogeneous, incomplete and fuzzy information. NRC publication: Ye

NRC Publications Archive

Irish Universities

DCU Online Research Access Service

Infected lung bulla caused by Neisseria elongata: A case report

Author: Akihiro Ohsumi
Hiroshi Date
Kiminobu Tanizawa
Tetsuji Moriya
Yasufumi Matsumura
Publication venue: Elsevier BV
Publication date: 01/01/2022

Neisseria elongata is a rod-shaped, Gram-negative, aerobic bacterium that is part of the normal oral bacterial flora. Although it was previously considered a non- or low-pathogenic organism, the development of bacterial detection methods has resulted in increased reports of N. elongata infections, such that it has recently been recognized as a causative agent of serious infections even in non-immune-compromised patients. A 77-year-old man with rheumatoid arthritis-associated interstitial lung disease, chronic obstructive pulmonary disease, and diabetes mellitus was diagnosed with a nodule in the left lower lobe of his lung. Thoracoscopic wedge resection was performed, and pus was discharged from the specimen. Mass spectrometry of the swab culture revealed N. elongata. The patient's postoperative course was uneventful, and he was doing well without recurrence at 13 months after surgery. Since N. elongata is part of the oral bacterial flora, the patient consulted a local dentist, and decayed teeth were extracted. Most of the reported cases of serious N. elongata infections have described infective endocarditis. This is the first report of infected lung bulla due to N. elongata infection, which demonstrates a new pathogenicity.

Directory of Open Access Journals

Vasohibin-1 Is a Poor Prognostic Factor of Ovarian Carcinoma

Author: Koichiro Shimoya
Mitsuru Shiota
Naoki Kanomata
Rikiya Sano
Soichiro Suzuki
Takuya Moriya
Yasufumi Sato
Publication venue: Tohoku University Medical Press
Publication date: 01/01/2017

Crossref

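A case study on policy approaches to securing mobility for the self-sustaining development of regions (地域の自立的発展のためのモビリティ確保に向けた施策のあり方に関する事例研究)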

Author: ISOGAWA Yasufumi
KOBAYASHI Kiyoshi
MORIYA Ryuichi
TAMURA Tohru
タムラトオル
五十川泰史
小林潔司
森谷隆一
田村亨
Publication venue: 土木学会 (Japan Society of Civil Engineers)
Publication date

application/pd

Institutional Repositories DataBase (IRDB)